Development of a Speech Recognition System for Spanish Broadcast News
Abstract
One application of automatic speech recognition (ASR) is the generation of transcripts to facilitate searching through multimedia collections containing spoken data. In the broadcast news domain in particular, ASR systems have been successfully deployed to index large collections of news, for two reasons. First, retrieval performed on ASR-generated transcripts with a word error rate (WER) under 50% gives reasonable results [1]. Second, ASR systems nowadays achieve high performance on broadcast news data: WERs below 10% are no longer unusual [2][3]. In the MESH project [4], whose goal is to extract, compare and combine multimedia content (audio, video and text) from multiple news sources, ASR modules for three languages (Spanish, German and English) are going to be integrated to generate transcripts of broadcast news data. This report presents the setup and evaluation of a speech recognition system for Spanish broadcast news. Section 3 gives a short overview of the basic components of an ASR system. Section 4 describes the development and training process of the acoustic and language models for the Spanish ASR system. The performance evaluation results are discussed in Section 5. The report ends with conclusions and suggestions for future work.
Similar resources
Spanish broadcast news transcription
We describe the Sail Labs Media Mining System (MMS) aimed at the transcription of Castilian Spanish broadcast news. In contrast to previous systems, the focus of this system is on Spanish as spoken on the Iberian Peninsula as opposed to the Americas. We discuss the development of a Castilian Spanish broadcast-news corpus suitable for training the various system components of the MMS and report o...
Real-time live broadcast news subtitling system for Spanish
Subtitling of live broadcast news is a very important application to meet the needs of deaf and hard-of-hearing people. However, live subtitling is a costly operation in terms of qualified human resources, and thus money, if high precision is desired. Automatic speech recognition researchers can help with this task, saving both time and money, by developing systems that deliver subtitle...
Statistical Machine Translation of Broadcast News from Spanish to Portuguese
In this paper we describe the work carried out to develop an automatic system for translation of broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish News and Statistical Machine Translation (SMT) of the results to the Portuguese language. ASR of broadcast news is based on the AUDIMUS...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
The L2F Broadcast News Speech Recognition System
Broadcast news plays an important role in our lives, providing access to news, information and entertainment. An automatic transcription is an important medium, not only because it can provide subtitles for the inclusion of people with special needs or be an advantage in noisy and crowded environments, but also because it enables data search and retrieval capabilities over the multimedia s...
Publication date: 2008